Supporting Text for “Accurate Computation of Survival Statistics in Genome-wide Studies”
Abstract
Model

Suppose a set G of genes was sequenced in a collection P of patients, all of whom have the same disease. Each sequenced gene g ∈ G partitions the set of patients into two subsets: P(g), the patients with a mutation in g, and P̄(g), the patients with no mutation in g. The goal is to identify genes whose mutational status is highly correlated with survival time, in the sense that the survival distribution of patients in P(g) differs from the survival distribution of patients in P̄(g).

A key challenge in survival analysis is dealing with censored patients, whose exact survival time is unknown. Censoring occurs for a variety of reasons, but the most common is that the study lasts only for a finite amount of time, and some fraction of patients remain alive at its conclusion. In addition, patients may leave during the course of the study for reasons that are unrelated to their treatment or disease state. The censored survival time is the last time the patient was observed in the study, which is a lower bound for the patient’s survival time. Survival analysis assumes that censoring is non-informative, i.e., the event that a patient is censored is independent of the patient’s survival beyond the censoring time.

The log-rank test [1] (or family of tests) is the most commonly used non-parametric test for comparing the survival distributions of two or more populations with data subject to censoring. The advantage of this test is that it includes the censored data in its statistic, rather than removing it from the data. Since a large fraction of patients may be censored (e.g., up to 94% in the data below), it is not desirable to remove this “missing data” from consideration. In the section below, we describe two different versions of the log-rank test: the conditional log-rank test and the permutational log-rank test.